Similarity of protein encoded
by the human c-erb-B-2 gene
to epidermal growth factor receptor
A novel v-erb-B-related gene, c-erb-B-2, which has been identified
in the human genome 1,2 , maps to human chromosome 17 at q21
(ref. 40), and seems to encode a polypeptide with a kinase domain 
that is highly homologous with, but distinct from, that of the 
epidermal growth factor (EGF) receptor 1 . The c-erb-B-2 gene is 
conserved in vertebrates and it has been suggested 1 that the neu 
gene, detected in a series of rat neuro/glioblastomas 3 , is, in fact, 
the rat c-erb-B-2 gene. Amplification of the c-erb-B-2 gene in a 
salivary adenocarcinoma and a gastric cancer cell line MKN-7 
suggests that its over-expression is sometimes involved in the 
neoplastic process. To determine the nature of the c-erb-B-2 pro- 
tein, we have now molecularly cloned complementary DNA for 
c-erb-B-2 messenger RNA prepared from MKN-7 cells. Its 
sequence shows that the c-erb-B-2 gene encodes a possible receptor 
protein and allows an analysis of the similarity of the protein to 
the EGF receptor and the neu product. As a consequence of 
chromosomal aberration in MKN-7 cells, a 4.6-kilobase (kb)
normal transcript and a truncated 2.3-kb transcript of c-erb-B-2
are synthesized at elevated levels. The latter transcript presumably 
encodes only the extracellular domain of the putative receptor. 

Poly(A) + RNA was prepared from MKN-7 cells, a gastric 
cancer cell line in which the c-erb-B-2 gene is amplified 40 and 
over-expressed (see below). A cDNA library constructed from 
the MKN-7 mRNA was screened initially with a 440-base pair 
Fig. 1 Analysis of c-erb-B-2 clones. A, Restriction maps of c-erb-B-2 clones.
B, Northern analysis of MKN-7 and placental mRNAs. A, Maps were
constructed by the standard procedure for restriction digestion analysis of
plasmid DNA. Single or double digestion with restriction endonucleases
KpnI (x), EcoRI (O), BamHI (•), PvuII (|), SmaI (*) and AccI (O) was
performed. The AccI site at position 2.2 kb in this map is absent from the
pCER217 insert but pCER237 and pCER204 were cleaved with the enzyme
at this position. Sequence data reveal that pCER217 carries a deletion of
42 bp in this region (see Fig. 2 legend). The exon sequence of the 440-bp
KX DNA prepared from the genomic clone λ107 (ref. 1) corresponds to a
150-bp EcoRI-KpnI fragment of pCER217 (3' probe). A 450-bp PvuII-BamHI
fragment of pCER217 (middle probe) and a 650-bp SmaI-PvuII
fragment of pCER235 (5' probe) were used for the second and third screenings,
respectively, of the MKN-7 cDNA library. Total RNA was prepared
from MKN-7 cells by the guanidine isothiocyanate-caesium chloride
method. Poly(A) + RNA was selected by two cycles of oligo(dT)-cellulose
column chromatography 9 . A cDNA library was constructed by the method
of Okayama and Berg, using 5.4 µg poly(A) + RNA and 2.8 µg vector-primer
DNA. (Avian myeloblastosis virus reverse transcriptase was from Dr J.
Beard.) Escherichia coli MC1061 (ref. 31) was used for transformation.
Ampicillin-resistant transformants (60,000 independent clones) were first
screened by hybridization at 60 °C for 16 h in the solution described
previously with the 440-bp KX DNA fragment. Among 21 distinctive positive
clones, pCER217 carried the longest insert of 2.8 kb. To obtain clones
harbouring cDNA for further upstream sequence, 50,000 independent clones
of the same library were screened with the 450-bp PvuII-BamHI fragment
mapped in the 5' portion of the pCER217 insert. Of 44 positive clones,
plasmid pCER237 carried the longest insert (3.2 kb). A plasmid pCER235
was assumed to be derived from an aberrant c-erb-B-2 mRNA which shares
5' sequence with normal 4.6-kb mRNA, as described in the text. The MKN-7
cDNA library (100,000 clones) was again screened with a 650-bp SmaI-PvuII
restriction fragment derived from pCER235; 48 positive clones were
obtained, one of which, designated pCER204, contained the longest insert
(4.0 kb) and the restriction maps of its 5' one-third and 3' two-thirds were
identical to those of pCER235 and pCER217, respectively. B, Nitrocellulose
filters containing poly(A) + RNAs from MKN-7 (lanes a, c, e) and human
placenta (lanes b, d, f) were hybridized with the 5' (lanes a, b), middle
(lanes c, d) and 3' probes (lanes e, f) shown in A. Lanes b and d are
photographs obtained after longer exposure of the film (overnight exposure
for lanes a, c; 2 weeks exposure for lanes b, d). Poly(A) + RNA (2 µg) from
MKN-7 cells was denatured with 50% formamide and 2.2 M formaldehyde
and applied to a 1% agarose gel containing 2.2 M formaldehyde 34 . RNAs
on the gel were transferred to a nitrocellulose filter 35 , which was then
hybridized for 16 h under stringent conditions (50% formamide, 4×SSC,
42 °C) with the DNA probes. After hybridization, the filter was washed
under stringent conditions as described elsewhere 1 . Under the hybridization
conditions used, no hybridization of the DNA probes with the EGF receptor
mRNA was observed. The DNA probes were labelled with [α- 32 P]dCTP
(3,000 Ci mmol -1 ; Amersham) by nick-translation.

(bp) KX DNA fragment prepared from a c-erb-B-2 genomic
clone and then with cDNA probes as described in Fig. 1 legend.
Among 113 positive clones, pCER204 carried the longest insert
of 4.0 kb, whereas the c-erb-B-2 mRNA is 4.6 kb long. Restriction
mapping of the positive clones having an insert of >1.5 kb led
to the identification of two distinct classes of clones (Fig. 1A),
Fig. 2 The c-erb-B-2 cDNA nucleotide sequence and predicted amino-acid
sequence. Nucleotides are numbered at both sides. Amino-acid sequence is
numbered from the putative signal peptide above the sequence. Black dotted
bars indicate the putative transmembrane region. The AATAAA box is
followed (14 bp downstream) by the polyadenylated 3' end of the mRNA. The
nucleotide sequence from residues 1 to 1,810 was from pCER235 and that
from residues 1,583 to the extreme 3' end was from pCER217, except that
the sequence of the 77-bp BamHI fragment (2,314-2,390) was from pCER237,
as a sequence of 42 bp was apparently deleted in pCER217 (see Fig. 1
legend). The overlapping nucleotide sequences (1,583-1,810) of pCER217
and pCER235 match. The extreme 3' sequence of pCER235, which was derived
from a sequence of unknown origin caused by chromosomal translocation, is
not shown here. The nucleotide sequence was determined by the
Maxam-Gilbert 37 procedure and the dideoxy chain termination method in
conjunction with bacteriophage M13mp19 (refs 38, 39).
suggesting that there are two species of c-erb-B-2 mRNA in
MKN-7 cells. As cDNA clones represented by pCER235 have
inserts of ~2.3 kb, the other class of clones represented by
pCER204 (or 217 and 237) was thought to be derived from the
4.6-kb mRNA. The pCER235 insert has a sequence that corresponds
to the 5' half, but not the 3' half, of the pCER204 insert,
suggesting that pCER235 was derived from mRNA which shares
a sequence with the 5' portion of normal 4.6-kb mRNA. This
possibility was demonstrated by Northern hybridization of
MKN-7 RNA (Fig. 1B). Poly(A) + RNA from MKN-7 cells was
probed with DNAs specific for different portions of the cDNA
clones. Using the 3' probe, equivalent to the KX DNA probe 1 ,
we observed a single species of 4.6-kb mRNA, which is expressed
at an elevated level in MKN-7 cells (50-fold relative to placenta).
However, both the PvuII-BamHI fragment (middle probe) and
the SmaI-PvuII fragment (5' probe) reacted with a 2.3-kb
Fig. 3 Amino-acid sequence of c-erb-B-2. A, Alignment
of the amino-acid sequences of c-erb-B-2 and EGF receptor
(EGFR). Amino acids of the two proteins are numbered on
the left. Identities in the sequences are marked by two dots
between the two lines; predicted transmembrane regions
are represented by dotted black bars; the possible N-linked
glycosylation sites by wavy lines; horizontal lines indicate
signal peptides (putative for c-erb-B-2); stars indicate cysteine
residues in the sequence: solid stars are common to
the two proteins and the open stars are specific to c-erb-B-2.
The major sites of threonine and tyrosine phosphorylation
of the EGF receptor or pp60 src are conserved in c-erb-B-2
and are shown by open and closed triangles, respectively.
The amino-acid residues corresponding to the extreme carboxy
terminus of pp60 src and v-erb-B protein are indicated
by vertical arrows at positions 986 and 1,228, respectively.
Boxed sequences show the cysteine clusters. B, Schematic
illustration of the c-erb-B-2 protein. Amino-acid sequence
homologies of the c-erb-B-2 protein and the EGF receptor
are shown. T, threonine and Y, tyrosine are possible
phosphorylation sites. C, cysteine clusters (also boxed). -O,
Possible glycosylation sites. An open box at the amino
terminus shows the putative signal peptide and that in the
middle shows possible transmembrane sequence. The
hydropathicity of the c-erb-B-2 sequence is also shown.
mRNA in addition to the 4.6-kb mRNA in MKN-7 cells, but
with the 4.6-kb mRNA alone in placenta. The 2.3-kb mRNA 
was over-produced to about the same extent as the 4.6-kb 
mRNA. 

Thus, the nucleotide sequences of the cloned inserts of 
pCER235 and pCER217 (237 or 204) were assumed to represent 
the overall sequence of the c-erb-B-2 gene product. The entire 
nucleotide sequence of 4,480 bp obtained from pCER235, 
pCER237 and pCER217 is shown in Fig. 2. The longest open 
reading frame is composed of 3,765 nucleotides, whose trans- 
lated amino-acid sequence of 1,255 residues is also shown. The 
predicted initiation codon ATG is flanked by nucleotides that
match Kozak's criteria 4 for a translation initiation site. A primary
translation product of the c-erb-B-2 gene was calculated to have
a relative molecular mass (M r ) of 137,895.
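
The arithmetic linking these figures can be checked with a minimal sketch. The helper name is hypothetical, the input values are those quoted above, and the sketch assumes that the 3,765-nucleotide count covers only the codons that are translated.

```python
def orf_summary(orf_nucleotides: int, relative_mass: float) -> dict:
    """Derive residue count and mean residue mass from an ORF length.

    Hypothetical helper: it only restates the arithmetic implied by the
    quoted ORF length and calculated M_r.
    """
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a whole number of codons")
    residues = orf_nucleotides // 3  # one amino acid per codon
    return {
        "residues": residues,
        "mean_residue_mass": relative_mass / residues,
    }


summary = orf_summary(3765, 137895)
print(summary["residues"])                     # 1255
print(round(summary["mean_residue_mass"], 1))  # 109.9
```

A mean of roughly 110 per residue is in the range usually seen for unmodified polypeptides, so the quoted M r is consistent with the 1,255-residue translation.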

Previous analysis of the c-erb-B-2 genomic clone λ107 showed
that the c-erb-B-2 gene product has a kinase domain that is
highly homologous with that of EGF receptor/v-erb-B. As
shown in Fig. 3A, the entire amino-acid sequences of the two
proteins are extremely similar and the hydrophilicity profile
(Fig. 3B) and secondary structure 6 (data not shown) predicted
from the amino-acid sequence of the c-erb-B-2 protein are also
similar to those predicted for the EGF receptor. A sequence of
22 amino-acid residues (654-675) is strongly hydrophobic and
Fig. 4 Comparison of rat neu and human c-erb-B-2. The amino-acid
sequence of c-erb-B-2 is compared with that of neu (kindly communicated
by Dr R. A. Weinberg). Only non-identical amino acids are shown for neu.
The amino acids for c-erb-B-2 are numbered above the sequence. The boxed
sequence indicates the transmembrane domain.
could serve as a membrane-anchoring domain. This sequence,
like those of other receptors 7-10 , is followed immediately by
basic amino acids (Lys-Arg-Arg), which helps in the correct 
allocation of the protein at the cell surface. The first 21 amino- 
acid residues are also highly hydrophobic, which suggests that 
they represent a signal sequence for membrane glycoprotein. 
Eight possible sites of N-linked glycosylation were identified 
in the amino-terminal moiety of the c-erb-B-2 protein. These 
data indicate that the two proteins with similar sizes have trans- 
membrane topologies that resemble each other. Therefore, we 
tentatively conclude that the c-erb-B-2 protein is a receptor for 
an unknown growth factor. 
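
As an illustration of how a hydrophobic stretch of this kind can be located, the following is a minimal sketch of a sliding-window hydropathy scan. It assumes the Kyte-Doolittle scale, a 19-residue window and a cutoff of 1.6, none of which is stated in the text; the function names and the toy sequence are hypothetical, and the toy sequence is not the c-erb-B-2 sequence.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean meets the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# Toy sequence: charged flanks around a 22-residue leucine stretch.
toy = "DEKRDEKRDE" + "L" * 22 + "KRRDEKRDEK"
print(hydrophobic_windows(toy))
```

Windows that overlap the leucine stretch score well above the cutoff, whereas a purely charged sequence yields no candidate positions.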

The sequence of the putative extracellular domain shows 44%
homology with the ligand binding domain of the EGF receptor.
A striking similarity is the presence of two cysteine-rich regions,
in which the spatial distribution of cysteine residues is virtually
identical with that in the EGF receptor. The sequences of cysteine
clusters are rather hydrophilic (Fig. 3B) and would facilitate
the generation of a specific conformation for signal transmission
through intramolecular or intermolecular S-S bridges,
as is suggested for other receptors for growth factors 8-10 . These
findings also suggest that the extracellular domains of the c-erb-B-2
protein and EGF receptor form similar configurations and
bind structurally related ligands. However, not only EGF but
also others such as tumour growth factor (TGF)-α, TGF-β, fibroblast
growth factor, erythropoietin, nerve growth factor, insulin
and platelet-derived growth factor all failed to activate kinase
activity presumably intrinsic to the c-erb-B-2 protein
(data not shown).

A sequence of 260 amino acids (residues 727-986), including
the postulated ATP-binding site 11 of the cytoplasmic domain
of the c-erb-B-2 protein, is homologous with the kinase domain
of the EGF receptor (82% homology) and with retroviral
oncogene products of the src family (25-40% homology 1 ).
Therefore, the c-erb-B-2 protein seems to have tyrosine kinase
activity. Preliminary experiments using antiserum raised against
a synthetic peptide of 14 amino-acid residues at the carboxy
terminus of c-erb-B-2 show that the c-erb-B-2 protein gp185
exhibits tyrosine kinase activity (T.A. et al., in preparation). The
homology between the two proteins decreases to 32% in the
269 amino-acid residues at the carboxyl end. However, three 
major in vitro phosphorylation sites of the EGF receptor 12 are 
also conserved (tyrosine residues at 1,139, 1,222 and 1,248 of 
c-erb-B-2), indicating that the c-erb-B-2 protein could also be 
autophosphorylated. 
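
Percentage figures of this kind can be obtained from a pairwise alignment such as that of Fig. 3A. The sketch below is one possible way to compute them, assuming that "homology" here means the fraction of identical residues over aligned, ungapped positions; the function name, the gap character and the toy strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns without a gap.

    Both inputs must already be aligned to the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: seven ungapped columns, six of them identical.
print(round(percent_identity("ACDE-FGH", "ACDQKFGH"), 1))  # 85.7
```

Applying the same function to separate slices of an alignment (for example the kinase region and the carboxy-terminal region) would give per-domain values comparable to those quoted above.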

Another interesting feature of the c-erb-B-2 protein is the 
presence of a threonine residue at position 686, surrounded by 
basic amino-acid residues, which is equivalent to threonine 654 
of the EGF receptor on which protein kinase C-mediated phos- 
phorylation occurs 13 . Therefore, the c-erb-B-2 protein may be 
phosphorylated by protein kinase C, which would play an
important role in signal transmission as suggested for the EGF 
receptor. 

Comparison of the nucleotide sequences and deduced amino- 
acid sequences of the recently characterized rat neu (see accom- 
panying paper 14 ) and human c-erb-B-2 (our present results) 
reveals that the neu gene is the rat counterpart of the c-erb-B-2
gene (see Fig. 4 for the amino-acid sequences). Surprisingly, 
only two amino acids in the kinase domain (residues 813 and 
817) differ between the two proteins, although we do not know 
whether the c-erb-B-2 gene of MKN-7 cells can transform NIH 
3T3 cells. The possible glycosylation sites of the neu product 
are located at positions corresponding to those of the c-erb-B-2 
product, except that no sequence corresponding to Asn-Asn- 
Thr-Thr (positions 124-127 of c-erb-B-2) is found in neu, sug- 
gesting that the mature c-erb-B-2 protein is as large as the neu 
gene product gp185.
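
Candidate N-linked glycosylation sites such as Asn-Asn-Thr-Thr can be listed with a short scan. This sketch assumes the widely used Asn-X-Ser/Thr consensus with X not proline, which the text does not spell out; the function name and the toy sequences are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping candidate sites are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(seq: str) -> list:
    """1-based positions of Asn residues in an Asn-X-Ser/Thr motif (X not Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(seq.upper())]


# Toy peptide containing Asn-Asn-Thr-Thr at positions 3-6.
print(n_glycosylation_sites("AANNTTAA"))  # [3, 4]
```

In the toy peptide both asparagines of the Asn-Asn-Thr-Thr motif satisfy the consensus, which is why an overlapping search is used; whether either residue is actually glycosylated cannot be inferred from sequence alone.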

Evidence is accumulating that amplification and over- 
expression of a proto-oncogene can cause cell transformation 
in vitro and can play a part in the neoplastic process of human 
tumours 15-20 . Amplification and elevated expression of the EGF
receptor gene have been observed in glioblastomas and in 
squamous carcinoma cell lines 21-24 . In contrast, amplification
of the c-erb-B-2 gene has been seen in three human adenocar- 
cinomas: one salivary adenocarcinoma 1 , one mammary car- 
cinoma 2 and one MKN-7 gastric cancer cell line, suggesting 
that increased expression of the c-erb-B-2 gene provides a selec- 
tive advantage in the formation or the progress of tumours of 
epithelial cells. 

Because the cDNA clone pCER235 derived from the 2.3-kb 
mRNA does not contain a signal for membrane anchoring of
the predicted polypeptide (data not shown), the translation
product of the 2.3-kb mRNA, corresponding to the sequence of
the c-erb-B-2 ligand-binding domain, should be secreted by
MKN-7 cells. Similarly, a truncated EGF receptor sequence has 
been reported to be over-expressed concomitantly with the EGF 
receptor in A431 cells 7,25,26 , as a consequence of a strong aberra- 
tion of the chromosomes 27 . Production of the truncated receptor 
in MKN-7 cells may also be caused by chromosomal aberra- 
tion 40 . We cannot exclude the possibility that extracellular 
accumulation of truncated growth factor receptors is associated 
with the appearance of the transformed phenotype. It is also 
possible that they provide a growth advantage to the cells in 
culture, in which the growth factor receptors are over-expressed. 

We thank N. Kitamura, H. Okubo and S. Nakanishi for help
in constructing the MKN-7 cDNA library, H. Kawano for 
technical assistance and S. Sasaki for help in preparing the 
manuscript. We also thank Y. Kaziro for critical reading of the 
manuscript. 
Molecular studies indicate that chimpanzee and gorilla are the
closest relatives of man (refs 1-7 and refs therein). The small
molecular distances found point to late ancestral separations 2,4,7 ,
with the most recent being between chimpanzee and man, as judged
by DNA hybridization 3,8 . Kluge 9 and Schwartz 10 contest these
conclusions: morphological characters group a chimpanzee-gorilla
clade with the Asian ape orang-utan in Kluge's cladistic study and
with an orang-utan-human clade in Schwartz's study. Clearly,
extensive sequencing of nuclear DNA is needed to resolve by
cladistic analysis the branching order within Hominoidea 11 .
Towards this goal, we are sequencing orthologues of the primate
ψη-globin locus 12,13 . Here, we compare the newly completed
sequences of orang-utan and rhesus monkey with human, chimpan- 
zee, gorilla, owl monkey, lemur and goat orthologues. Our findings 
substantially increase the evidence indicative of a human-chim- 
panzee-gorilla clade with ancestral separations around 8 to 6 Myr 
ago. We also verify that neutral hominoid DNA evolved at 
markedly retarded rates. 

The η locus is one of five ancient β-related globin genes
linked in a cluster 5'-ε-γ-η-δ-β-3' that arose from tandem duplications
(200-100 Myr) 12,13 . This ancient η gene was embryonically
expressed in early eutherians and persisted as a functional
gene in artiodactyls, but became a pseudogene in proto-primates
and was lost from rodents and lagomorphs. Previous work from
this laboratory 13 established that the goat η gene sequence




Fig. 1 (Opposite) Aligned nucleotide sequences of seven primate ψη-globin
genes and the active goat η-globin gene. Total DNA was isolated
from orang-utan (Pongo pygmaeus) no. 1 liver (Yerkes Primate Center)
and from a rhesus monkey (Macaca mulatta) blood sample (California
Primate Center). DNAs were subjected to a series of limited EcoRI
digestions and size-selected fragments, 15-25 kilobases (kb), were
cloned into the λ vector Charon 32 33 by the procedure described by
Slightom et al. 34 . Recombinant phage DNAs were packaged into phage
capsids using the in vitro phage packaging procedure 35 . Charon 32
phage were plated on the recA- Escherichia coli host ED8767 36 and
screened using a 32 P-labelled 245-base-pair (bp) AvaII-EcoRI fragment
isolated from the γ-globin cDNA clone pJW151 37 . The ψη-containing
EcoRI fragments were subcloned into pBR322 (orang 7.0 kb, rhesus
10.0 kb). DNA sequencing was done using the chemical procedure
described by Maxam and Gilbert 38 . The nucleotide sequences of
human, chimpanzee and gorilla are from Chang and Slightom 16 , and
owl monkey and lemur are from Harris et al. 12 . Only the 5'-flanking to
exon 2 of the lemur sequence is orthologous to the η gene 12,15 ; therefore
only this part of the sequence is used in our analyses. The goat η-globin
gene (goat ε II gene) nucleotide sequence is from Shapiro et al. The
nucleotide sequencing numbering system is based on the overall alignment
among these sequences. The complete nucleotide sequence for the
human ψη-globin gene (HUM A) is presented on the top line and
differences are given for the remaining sequences. Asterisks indicate the
presence of gaps placed to minimize the number of genetic changes
during descent 13,39 . The simian ψη genes are divided into three exons
separated by two introns each obeying the GT-AG splicing rule 40 . Lemur
ψη 12,15 extends only to the 3' end of exon 2 and has a defective intron
1 splice site. Intron 1 sequences are all 121 bp in length, while intron 2
lengths vary between 841 and 877 bp. The 5'-flanking region of the orang
ψη gene contains a 38-bp direct repeat (positions 205-244 and 245-2…)
common to hominoid ψη genes. Each repeat contains a CCAAT type
promoter element. The RNA polymerase II binding site (TATAA) has
diverged in the ψη genes from that found in normal β-type globin
genes 41 (note positions 299-304). As in the other primate ψη genes the
initiation codon (INT positions 384-386) of orang and rhesus shows
defects that prohibit translation. All of the primate ψη genes share a
terminator sequence TGA (positions 1,885-1,888) and have the canonical
poly(A) addition signal 42 . With regard to the direct repeat at positions
1,895-1,914 and 1,915-1,940, the 22-base gap (positions 1,895-1,…)
in orang and rhesus could be the result of either two independent
deletions or a deletion of the 5' direct repeat in the stem catarrhines
followed by a reduplication in the stem hominines. Exons terminate at
the ► sites.



